Speech separation algorithm based on convolutional encoder decoder and gated recurrent unit
CHEN Xiukai, LU Zhihua, ZHOU Yu
Journal of Computer Applications    2020, 40 (7): 2137-2141.   DOI: 10.11772/j.issn.1001-9081.2019111968
In most deep-learning-based speech separation and speech enhancement algorithms, the magnitude spectrum obtained by the Fourier transform is used as the input feature of the neural network, and the phase information in the speech signal is ignored. However, previous studies show that phase information is essential for improving speech quality, especially at low Signal-to-Noise Ratio (SNR). To address this problem, a speech separation algorithm based on a Convolutional Encoder-Decoder network and a Gated Recurrent Unit network (CED-GRU) was proposed. Firstly, exploiting the fact that the original waveform contains both amplitude and phase information, the raw waveform of the mixed speech signal was used as the input feature. Secondly, the temporal dependency in the speech signal was effectively modeled by combining the Convolutional Encoder-Decoder (CED) network with the Gated Recurrent Unit (GRU) network. Compared with the Permutation Invariant Training (PIT), Deep Clustering (DC) and Deep Attractor Network (DAN) algorithms, the proposed algorithm improves the Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) scores on male-male, male-female and female-female speaker mixtures by 1.16 and 0.29, 1.37 and 0.27, 1.08 and 0.30 (over PIT); 0.87 and 0.21, 1.11 and 0.22, 0.81 and 0.24 (over DC); and 0.64 and 0.24, 1.01 and 0.34, 0.73 and 0.29 (over DAN), respectively. The experimental results show that the CED-GRU-based speech separation system has great value in practical applications.
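The motivation above, that magnitude-spectrum features discard phase while the raw waveform keeps it, can be checked with a small sketch (pure Python with a naive DFT for illustration; an FFT would be used in practice, and all names here are illustrative, not from the paper):

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def magnitude_spectrum(frame):
    """Magnitude-only feature: the phase of each DFT bin is thrown away."""
    return [abs(c) for c in dft(frame)]

n = 64
k0 = 5  # tone index
# Two frames of the same tone that differ only in phase (quarter-period shift).
frame_a = [math.cos(2 * math.pi * k0 * t / n) for t in range(n)]
frame_b = [math.cos(2 * math.pi * k0 * t / n + math.pi / 2) for t in range(n)]

mag_a = magnitude_spectrum(frame_a)
mag_b = magnitude_spectrum(frame_b)

# The waveforms clearly differ, yet their magnitude spectra are identical:
same_mag = all(abs(a - b) < 1e-6 for a, b in zip(mag_a, mag_b))
different_wave = max(abs(a - b) for a, b in zip(frame_a, frame_b)) > 0.5
```

A network fed only `mag_a`/`mag_b` cannot distinguish these two frames, which is why feeding the raw waveform, as the CED-GRU algorithm does, preserves information that magnitude features lose.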
Indoor speech separation and sound source localization system based on dual-microphone
CHEN Binjie, LU Zhihua, ZHOU Yu, YE Qingwei
Journal of Computer Applications    2018, 38 (12): 3643-3648.   DOI: 10.11772/j.issn.1001-9081.2018040874
To explore the possibility of using two microphones to separate and localize multiple sound sources in a two-dimensional plane, an indoor speech separation and sound source localization system based on dual microphones was proposed. A dual-microphone delay-attenuation model was first established from the signals collected by the microphones. Then, the Degenerate Unmixing Estimation Technique (DUET) algorithm was used to estimate the delay and attenuation parameters of the model, and a parameter histogram was drawn. In the speech separation stage, Binary Time-Frequency Masking (BTFM) was applied: according to the parameter histogram, a binary mask was constructed to separate the mixed speech. In the sound source localization stage, the equations determining the sound source positions were obtained by deriving the relationship between the attenuation parameters of the model and the signal energy ratio. The Roomsimove toolbox was used to simulate the indoor acoustic environment, and through Matlab simulation and geometric coordinate calculation, localization in the two-dimensional plane was completed while the multiple sound sources were separated. The experimental results show that the localization errors of the proposed system for multiple sound source signals are less than 2%, which contributes to the research and development of small-scale systems.
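The core of the delay-attenuation model can be sketched as follows: under DUET's assumption that the sources rarely overlap in the time-frequency plane, the ratio of the two microphone spectra in each bin yields one attenuation and one delay estimate, and the bins cluster around the true per-source parameters, which is what the parameter histogram visualizes and what the binary mask exploits. This is an illustrative frequency-domain toy (variable names and the simulated values are assumptions, not from the paper):

```python
import cmath
import math

def duet_parameters(X1, X2, freqs):
    """Per-bin (attenuation, delay) estimates from the ratio X2/X1.
    Valid when |w * delay| < pi, so the phase does not wrap."""
    params = []
    for x1, x2, w in zip(X1, X2, freqs):
        r = x2 / x1
        a = abs(r)                    # attenuation estimate for this bin
        delta = -cmath.phase(r) / w   # delay estimate (in samples)
        params.append((a, delta))
    return params

# Simulate two anechoic sources: source 1 occupies the low bins, source 2 the
# high bins (the W-disjoint orthogonality assumption behind DUET).
freqs = [2 * math.pi * k / 64 for k in range(1, 9)]
S = [1.0 + 0.5j] * 8                  # arbitrary source spectra
a1, d1 = 0.8, 2.0                     # mic-2 attenuation/delay for source 1
a2, d2 = 1.2, -1.0                    # mic-2 attenuation/delay for source 2
X1 = S[:]                             # mic 1 observes the sources directly
X2 = [a1 * s * cmath.exp(-1j * w * d1) if i < 4 else
      a2 * s * cmath.exp(-1j * w * d2)
      for i, (s, w) in enumerate(zip(S, freqs))]

params = duet_parameters(X1, X2, freqs)
# A binary time-frequency mask assigns each bin to the nearest parameter
# cluster; here the low bins recover source 1's (0.8, 2.0) exactly.
mask_source1 = [abs(a - a1) < 0.1 and abs(d - d1) < 0.1 for a, d in params]
```

In the real system the per-bin estimates are noisy, so they are accumulated into a 2D histogram whose peaks give the per-source parameters; the attenuation peak is then what the localization stage relates to the signal energy ratio.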